tg-me.com/CodeProgrammer/3768
K-Means Clustering Explained - For Beginners
What is K-Means?
It’s an unsupervised machine learning algorithm that groups unlabeled data into K clusters of similar points. Similarity is distance-based: each point joins the cluster whose center (centroid) is nearest to it.
Intuitive Example:
You run a mall. Your data has:
► Age
► Annual Income
► Spending Score
K-Means can divide customers into:
⤷ Budget Shoppers
⤷ Mid-Range Customers
⤷ High-End Spenders
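Here is a minimal sketch of that mall scenario with scikit-learn. The nine customers are hypothetical toy values, not real data. Standardizing the features first is an added assumption: without it, income (large numbers) would dominate the Euclidean distance over age.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: [age, annual income (k$), spending score 1-100]
customers = np.array([
    [22, 18, 20], [25, 20, 15], [23, 17, 22],     # look like budget shoppers
    [35, 55, 50], [38, 60, 45], [40, 58, 52],     # look like mid-range customers
    [30, 120, 90], [32, 130, 85], [28, 125, 95],  # look like high-end spenders
])

# Put all features on a comparable scale (mean 0, std 1) before clustering
X_scaled = StandardScaler().fit_transform(customers)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

# K-Means only returns integer cluster IDs; naming the segments is up to you
for cluster_id in range(3):
    members = customers[labels == cluster_id]
    print(cluster_id, members.mean(axis=0).round(1))
```

The labels such as “Budget Shoppers” come from inspecting each cluster's average age, income, and score afterwards; the algorithm itself never sees those names.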
How it works:
① Choose the number of clusters K
② Randomly initialize K centroids (for example, K randomly chosen data points)
③ Assign each point to its nearest centroid
④ Move each centroid to the mean of the points assigned to it
⑤ Repeat ③ and ④ until the centroids stop moving (convergence)
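The five steps can be sketched from scratch in NumPy (Lloyd's algorithm). The function names kmeans and assign are illustrative, not a library API. With a single random start this can settle in a poor local optimum; on the toy data below, for instance, starting from two points on the left side converges to a top/bottom split instead of left/right. scikit-learn reduces that risk with k-means++ initialization and several restarts (n_init).

```python
import numpy as np

def assign(X, centroids):
    """Step 3: index of the nearest centroid for every point."""
    # Squared Euclidean distances, shape (n_points, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-Means sketch (Lloyd's algorithm). Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct, randomly chosen data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign(X, centroids)
        # Step 4: move each centroid to the mean of its assigned points;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids
    return assign(X, centroids), centroids

labels, centroids = kmeans([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], k=2)
print(labels)
print(centroids)
```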
Objective:
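Minimize the total squared distance between each data point and the centroid of the cluster it belongs to:
$$J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$
Where xᵢ = a data point, Cⱼ = the set of points assigned to cluster j, μⱼ = the centroid (mean) of Cⱼ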
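The objective J can be checked numerically. This sketch computes it by hand for the small dataset used in the code example below and compares it with scikit-learn's inertia_ attribute, which is documented as the sum of squared distances of samples to their closest cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For every point, look up the centroid of the cluster it was assigned to
assigned_centroids = model.cluster_centers_[model.labels_]

# J = sum of squared distances from each point to its own centroid
J = ((X - assigned_centroids) ** 2).sum()

print(J)               # objective computed by hand
print(model.inertia_)  # scikit-learn's name for the same quantity
```

Here the two centroids sit at (1, 2) and (10, 2), so each cluster contributes 4 + 0 + 4 and J works out to 16.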
How to pick K:
Use the Elbow Method
⤷ Plot K against the total within-cluster sum of squares (within-cluster variance)
⤷ The “elbow” of the curve, where adding another cluster stops reducing it much, is a good candidate for K
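A minimal elbow-method sketch, assuming synthetic data made of three well-separated blobs (hypothetical, generated on the fly). It records inertia_ (the within-cluster sum of squares) for K = 1 to 6; plotting the two lists, for example with matplotlib, gives the elbow curve, but printing keeps the sketch dependency-light.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three blobs of 20 points each around known centers
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [8, 0], [4, 7]])
X = np.vstack([c + rng.normal(scale=0.8, size=(20, 2)) for c in centers])

ks = list(range(1, 7))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # total within-cluster sum of squares

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

On data like this the inertia drops sharply up to K = 3 and only slightly afterwards, which is the elbow. On real data the bend is often less obvious, so treat it as a heuristic.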
Code Example (scikit-learn):
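from sklearn.cluster import KMeans

# Two obvious groups: points near x=1 and points near x=10
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# n_init=10: run 10 initializations and keep the one with the lowest within-cluster SSE
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)
print(model.labels_)           # cluster index assigned to each point
print(model.cluster_centers_)  # coordinates of the two centroids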
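Once fitted, the same model can label points it has never seen: KMeans.predict assigns each new point to its nearest learned centroid. The two new points below are arbitrary illustrative values.

```python
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Hypothetical unseen points: one near the left group, one near the right group
new_points = [[0, 3], [12, 3]]
predicted = model.predict(new_points)  # index of the nearest centroid for each point
print(predicted)
```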
Best Use Cases:
⤷ Customer segmentation
⤷ Image compression
⤷ Market analysis
⤷ Social network analysis
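Image compression with K-Means usually means color quantization: cluster the pixel colors, then replace each pixel with its cluster's centroid color. Storing a small palette plus one palette index per pixel, instead of three 8-bit values per pixel, is where the savings come from. The 32×32 image below is a hypothetical stand-in generated from noise; with a real photo you would load its pixel array instead.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for a photo: 32x32 RGB made of two base colors plus noise
rng = np.random.default_rng(0)
base = np.where(rng.random((32, 32, 1)) < 0.5, [200, 30, 30], [30, 30, 200])
image = np.clip(base + rng.normal(scale=10, size=(32, 32, 3)), 0, 255)

pixels = image.reshape(-1, 3)  # one row per pixel, columns = R, G, B
k = 4                          # size of the color palette
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

palette = model.cluster_centers_.round().astype(np.uint8)  # k representative colors
compressed = palette[model.labels_].reshape(image.shape)   # each pixel -> its palette color

print(len(np.unique(compressed.reshape(-1, 3), axis=0)))    # at most k distinct colors
```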
Limitations:
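► Sensitive to outliers: a centroid is a mean, so a single extreme point can pull it far from the rest of its cluster
► Requires you to choose K in advance
► Works best with roughly spherical, similarly sized clusters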
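A small sketch of the outlier problem, using hypothetical toy data: two genuine groups plus one extreme point, clustered with K = 2. Because any cluster containing the outlier would have a huge squared-distance cost, the lowest-J solution gives the outlier a cluster of its own and merges the two real groups into the other.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two genuine groups plus one extreme point
group_a = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]
group_b = [[4, 0], [4, 1], [5, 0], [5, 1], [4.5, 0.5]]
outlier = [[50, 0]]
X = np.array(group_a + group_b + outlier, dtype=float)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Expected pattern: the outlier sits alone, groups A and B share one label
print(model.labels_)
print(model.cluster_centers_)
```

Removing or down-weighting such points before clustering, or using a method less sensitive to extremes, avoids this failure.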